
    Rapid Adaptation of Foreign-accented HMM-based Speech Synthesis

    This paper presents findings on listeners' perception of speaker identity in synthetic speech. Specifically, we investigated the effect on a speaker's perceived identity of using differently accented average voice models and limited amounts (five and fifteen sentences) of the speaker's data to create the synthetic stimuli. A speaker discrimination task was used to measure speaker identity. Native English listeners were presented with natural and synthetic speech stimuli in English and were asked to decide whether they thought the sentences were spoken by the same person or not. An accent rating task was also carried out to measure the perceived accents of the synthetic speech stimuli. The results show that listeners, for the most part, perform as well at speaker discrimination when the stimuli have been created using five or fifteen adaptation sentences as when using 105 sentences. Furthermore, the accent of the average voice model does not affect listeners' speaker discrimination performance, even though the accent rating task shows that listeners perceive different accents in the synthetic stimuli. Listeners do not base their speaker similarity decisions on perceived accent.
    Index Terms: speech synthesis, rapid adaptation
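The outcome of a same/different speaker discrimination task like this is commonly summarised with the sensitivity index d′ from signal detection theory. The sketch below is an illustrative reconstruction, not the paper's own analysis code; it assumes only same-speaker/different-speaker trial counts and uses the Python standard library.

```python
from statistics import NormalDist

def d_prime(hits, same_trials, false_alarms, diff_trials):
    """Sensitivity index d' for a same/different speaker task.

    A 'hit' is a same-speaker pair correctly judged "same"; a
    'false alarm' is a different-speaker pair judged "same".
    Rates of exactly 0 or 1 are nudged by half a trial (the
    log-linear correction) so the inverse normal CDF stays finite.
    """
    hit_rate = (hits + 0.5) / (same_trials + 1)
    fa_rate = (false_alarms + 0.5) / (diff_trials + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

With such a measure, d′ near zero means listeners cannot tell the speakers apart, and larger positive values mean better discrimination; comparing d′ across the 5-, 15-, and 105-sentence conditions is one way to quantify "perform as well".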

    Cross-lingual acoustic model adaptation for speaker-independent speech recognition

    For good-quality speech recognition, the recognition system must be able to adapt to each speaker's voice and speaking style. Most speech recognition systems are developed for very specific purposes and for linguistically homogeneous groups. However, as user groups increasingly comprise people from differing linguistic backgrounds, there is a growing demand for efficient multilingual speech technology that takes into account not only varying dialects and accents but also different languages.
    This thesis investigated how the acoustic models for English and Finnish can be efficiently combined to create a multilingual speech recognition system, and how these combined systems perform speaker adaptation within and across languages, using data from one language to improve recognition of the same speaker speaking another language. Recognition systems were trained on large Finnish and English corpora and tested on both monolingual and bilingual material. This study shows that the thresholds for safe merging of the Finnish and English model sets are so low that merging can hardly be motivated on efficiency grounds. It was also found that recognition of native Finnish could be improved using English speech data from the same speaker. This only works one way: recognition of English spoken as a foreign language could not be significantly improved with the help of Finnish speech data.
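The "safe merging threshold" idea can be illustrated with a small sketch: one common way to decide whether two Gaussian HMM states from different languages are similar enough to share is to compare a symmetrised Kullback-Leibler divergence against a threshold. This is a minimal illustration of that general technique, not the thesis's actual clustering code; the function names, the diagonal-covariance assumption, and the threshold value are all assumptions.

```python
import math

def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between two diagonal-covariance Gaussians,
    summed over dimensions (closed form, per dimension)."""
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl

def should_merge(state_a, state_b, threshold):
    """Merge two HMM states (each a (means, variances) pair) when
    the symmetrised divergence between them is below the threshold."""
    div = (kl_diag_gauss(*state_a, *state_b)
           + kl_diag_gauss(*state_b, *state_a))
    return div < threshold
```

A low safe threshold, as reported here, means only near-identical states may be tied across Finnish and English before recognition accuracy suffers, so few states actually merge and the efficiency gain is small.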

    Speaker similarity evaluation of foreign-accented speech synthesis using HMM-based speaker adaptation

    This paper describes a speaker discrimination experiment in which native English listeners were presented with natural and synthetic speech stimuli in English and were asked to judge whether they thought the sentences were spoken by the same person or not. The natural speech consisted of recordings of Finnish speakers speaking English. The synthetic stimuli were created using adaptation data from the same Finnish speakers. Two average voice models were compared: one trained on Finnish-accented English and the other on American-accented English. The experiments illustrate that listeners perform well at speaker discrimination when the stimuli are both natural or both synthetic, but when the speech types are crossed, performance drops significantly. We also found that the type of accent in the average voice model had no effect on the listeners' speaker discrimination performance.

    Non-game like training benefits spoken foreign-language processing in children with dyslexia

    Publisher Copyright: © 2023 Junttila, Smolander, Karhila, Kurimo and Ylinen.
    Children with dyslexia often face difficulties in learning foreign languages, which is reflected as weaker neural activation. However, digital language-learning applications could support learning-induced plastic changes in the brain. Here we aimed to investigate whether plastic changes occur in children with dyslexia more readily after targeted training with a digital language-learning game or after similar training without game-like elements. We used auditory event-related potentials (ERPs), specifically the mismatch negativity (MMN), to study learning-induced changes in brain responses. Participants were 24 school-aged Finnish-speaking children with dyslexia and 24 age-matched typically reading control children. They practised English speech sounds and words with the "Say it again, kid!" (SIAK) language-learning game for 5 weeks between ERP measurements. During the game, the players explored game boards and produced English words aloud to score stars as feedback from an automatic speech recognizer. To compare the effectiveness of the training types (game vs. non-game), we embedded in the game some non-game levels stripped of all game-like elements. In the dyslexia group, the non-game training increased the MMN amplitude more than the game training, whereas in the control group the game training increased the MMN response more than the non-game training. In the dyslexia group, the MMN increase with the non-game training correlated with phonological awareness: the children with poorer phonological awareness showed a larger increase in the MMN response. Improved neural processing of foreign speech sounds, as indicated by the MMN increase, suggests that targeted training with a simple application could alleviate some spoken foreign-language learning difficulties that are related to phonological processing in children with dyslexia. Peer reviewed
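In general, the MMN is obtained from the deviant-minus-standard ERP difference wave, averaged over a latency window. The sketch below illustrates that standard computation on plain Python lists; the window bounds and sampling grid are assumptions for illustration, since the abstract does not give the study's actual analysis parameters.

```python
def mmn_amplitude(standard_erp, deviant_erp, times, window=(0.15, 0.25)):
    """Mean amplitude of the deviant-minus-standard difference wave
    within a latency window (seconds after stimulus onset).

    All three lists share the same sampling grid. A genuine MMN
    appears as a negative value; training-induced learning shows up
    as this value growing more negative between measurements.
    """
    lo, hi = window
    diffs = [d - s
             for s, d, t in zip(standard_erp, deviant_erp, times)
             if lo <= t <= hi]
    return sum(diffs) / len(diffs)
```

Comparing this amplitude before and after the 5-week training, per group and per training type, is the kind of contrast the reported game vs. non-game effects rest on.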

    The Effects of a Digital Articulatory Game on the Ability to Perceive Speech-Sound Contrasts in Another Language

    Digital and mobile devices enable easy access to applications for learning foreign languages. However, experimental studies on the effectiveness of these applications are scarce. Moreover, it is not understood whether the effects of speech and language training generalize to features that are not trained. To this end, we conducted a four-week intervention focused on articulatory training and learning of English words in 6-7-year-old Finnish-speaking children who used the digital language-learning game app Pop2talk. An essential part of the app is automatic speech recognition, which enables assessing children's utterances and giving instant feedback to the players. The generalization of the effects of such training in English was explored using discrimination tasks before and after training (or the same period of time in a control group). The stimuli of the discrimination tasks represented phonetic contrasts from two non-trained languages: Russian sibilant consonants and Mandarin tones. We found some improvement with the Russian sibilant contrast in the gamers, but it was not statistically significant. No improvement was observed for the tone contrast in the gaming group. A control group with no training showed no improvement in either contrast. The pattern of results suggests that the game may have improved the perception of non-trained speech sounds in some but not all individuals, yet the effects of motivation and attention span on their performance could not be excluded with the current methods. Children's perceptual skills were linked to their word learning in the control group but not in the gaming group, where recurrent exposure also enabled learning for children with poorer perceptual skills. Together, the results demonstrate beneficial effects of learning via a digital application, yet raise a need for further research on individual differences in learning. Peer reviewed

    User Experiences from L2 Children Using a Speech Learning Application : Implications for Developing Speech Training Applications for Children

    We investigated user experiences of 117 Finnish children aged between 8 and 12 years in a trial of an English language learning programme that used automatic speech recognition (ASR). We used measures that encompassed both affective reactions and questions tapping into the children's sense of pedagogical utility. We also tested their perception of sound quality and compared reactions to game-based and non-game-based versions of the application. Results showed that children expressed higher affective ratings for the game-based version of the application than for the non-game version. Children also expressed a preference for playing with a friend over playing alone or within a group. They found the assessment of their speech useful, although they did not necessarily enjoy hearing their own voices. The results are discussed in terms of the implications for user interface (UI) design in speech learning applications for children. Peer reviewed